Introduction

The following is a brief linguistic analysis of the use of racially charged language in William Faulkner’s Absalom, Absalom!. Faulkner’s representation of race was complicated, as was his own relationship with it. As a Southern white moderate, he voiced his anguish over the dehumanization of African Americans under Jim Crow segregation, yet he could also casually refer to people as “niggers” when publicly retelling a comic story. There is no shortage of scholarship on Faulkner and race in general, and on Absalom, Absalom! in particular. Given this extensive critical history, it almost goes without saying that a computational analysis of word choice, especially of racially charged language, cannot do justice to the complexities and nuances of either the text or Faulkner’s broader critical intervention. Nevertheless, techniques common in corpus linguistics (CL) can give a bird’s-eye view of how the use of certain words is patterned; this pattern can, in turn, inform subsequent close readings.

This piece uses several standard CL techniques, along with one more complex analysis that is possible only for practitioners with access to the Digital Yoknapatawpha data set. Each technique is treated in a separate part.

All of the data was generated in the R programming language with the tidyverse suite of packages. The full repository is available at https://github.com/joostburgers/absalom_sentiment_analysis. Due to copyright restrictions, the repository does not include the Absalom, Absalom! text file used for the analysis.

Part 1: Statistical Overview of Absalom, Absalom!

Text pre-processing

Any textual analysis requires some pre-processing, and the steps that follow are standard procedures in CL. The text of Absalom, Absalom! was read in as a .txt file, broken into its nine chapters, and further subdivided into sentences. The words were then “tokenized,” that is, split into individual units (tokens); in the process, capital letters are converted to lowercase and special characters and punctuation are removed, which makes it easier for the computer to compare words. Next, “stop words” were removed. These are words like “the,” “a,” “on,” and “at” that are very frequent within any text and do not add to the analysis. Finally, the words were lemmatized. Lemmatization reduces a word to its base or dictionary form (its lemma): for example, “Negroes” becomes “Negro.” This way, all instances of the concept “Negro” are unified as one, which prevents separate counts for words like Negro, Negroes, and Negro’s.
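The steps above can be sketched in a few lines. The author’s pipeline was written in R with the tidyverse; what follows is instead a minimal Python sketch using only the standard library, assuming plain ASCII text (straight apostrophes). The stop-word list and the lemma lookup table are tiny hypothetical stand-ins for a full stop-word list and a real lemmatizer, and chapter splitting is omitted because the chapter markers in the source file are not known.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real analysis uses a full list).
STOP_WORDS = {"the", "a", "an", "on", "at", "of", "and", "to", "in", "he", "his"}

# Hypothetical lemma lookup table standing in for a real lemmatizer.
LEMMAS = {"negroes": "negro", "fathers": "father", "was": "be", "waited": "wait"}

def split_sentences(text):
    """Naively split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase, keep runs of letters/apostrophes, drop all other characters."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t.strip("'") for t in tokens if t.strip("'")]

def lemmatize(token):
    """Drop a possessive 's, then map the token through the lookup table."""
    if token.endswith("'s"):
        token = token[:-2]
    return LEMMAS.get(token, token)

def preprocess(text):
    """Return (sentence_number, lemma) pairs with stop words removed."""
    rows = []
    for sent_id, sentence in enumerate(split_sentences(text), start=1):
        for token in tokenize(sentence):
            if token not in STOP_WORDS:
                rows.append((sent_id, lemmatize(token)))
    return rows

print(preprocess("The Negroes waited at the gate. His father's house was quiet!"))
```

Note that stop words are filtered before lemmatization, mirroring the order described above.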

Each resulting word was then tagged as either racially charged or not by adding a column called race_word containing TRUE or FALSE. This was done by creating a list of racial words and joining it to the data table with a left join. Essentially, every occurrence of a word like “Negro,” “Nigger,” or “Octoroon” is tagged as TRUE, and every other word as FALSE. With this pre-processing complete, it is possible to provide some key statistical insights.
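The tagging step can be illustrated the same way. In this minimal Python sketch (not the author’s R code), a hypothetical lookup table holds only the three example terms named above rather than the study’s full list. As in a left join, every token row is kept, and rows with no match in the table receive FALSE.

```python
# Hypothetical lookup table: only the three example terms named in the text,
# not the full list of racial words used in the study.
RACE_LOOKUP = {"negro": True, "nigger": True, "octoroon": True}

def tag_race_words(rows):
    """Left-join (sentence_id, lemma) rows against RACE_LOOKUP.

    Every input row is kept; lemmas without a match get race_word = False.
    """
    return [
        {"sentence": sent_id, "word": word, "race_word": RACE_LOOKUP.get(word, False)}
        for sent_id, word in rows
    ]

for row in tag_race_words([(1, "negro"), (1, "gate"), (2, "octoroon")]):
    print(row)
```

Here `.get(word, False)` plays the role of filling unmatched rows with FALSE.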

Word Frequency

The chart below shows the ten most frequent non-racial words and the ten most frequent racial words in the text. Hovering over the individual bars reveals their precise counts, and clicking on TRUE or FALSE toggles that series on and off.

Figure 1: The chart displays the most frequent words based on word stem. This prevents counting 'father' and 'father's' separately.

What is immediately noticeable is that the word “nigger” is the most frequent racial term. It exceeds “negro” by 50 occurrences. It occurs about a third as often as the name Henry (the main character) and about half as often as the name of the racially ambiguous Charles Bon. Importantly, the number of times a character’s name appears is not the same as the number of times that character is actually referred to in the text: the pronouns “he” or “she” could equally denote a character, but such references are not counted here.
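Counting the most frequent words per group amounts to tallying lemmas separately for the TRUE and FALSE rows. A minimal Python sketch, assuming token rows shaped like {"word": ..., "race_word": ...} (a hypothetical format, not the author’s data table):

```python
from collections import Counter

def top_words(tagged, n=10):
    """Return the n most frequent lemmas, split by the race_word flag."""
    counts = {True: Counter(), False: Counter()}
    for row in tagged:
        counts[row["race_word"]][row["word"]] += 1
    # most_common orders ties by first occurrence in the input
    return {flag: counter.most_common(n) for flag, counter in counts.items()}

sample = (
    [{"word": "henry", "race_word": False}] * 3
    + [{"word": "bon", "race_word": False}] * 2
    + [{"word": "negro", "race_word": True}]
)
print(top_words(sample, n=2))
```

The sample counts here are invented for illustration and do not reflect the novel’s actual frequencies.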

Collocations

Collocation analysis determines which words tend to appear together. One way to do this is to create n-grams: sequences of n consecutive words. By examining the n-grams around particular words, we can get a better sense of their context. For example, in her research on British newspapers, Dawn Archer has shown that the most common bigram (an n-gram of two words) for “Muslim” is “Muslim terrorist”; this strong association between the two words indicates how Muslims are represented in the British media.
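The n-gram approach can likewise be sketched in Python. This minimal illustration assumes an already tokenized list of words and counts every bigram that contains a chosen target word. Whether the study built its n-grams before or after stop-word removal is not stated, so that choice is left to whoever prepares the token list.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def collocates(tokens, target, n=2):
    """Count the n-grams in which the target word appears at any position."""
    return Counter(gram for gram in ngrams(tokens, n) if target in gram)

tokens = ["french", "architect", "grim", "french", "architect", "haggard"]
print(collocates(tokens, "architect").most_common(1))
```

Setting n=3 would give trigrams, which capture a slightly wider window of context around the target word.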

Figure 2: This plot shows the most common co-occurrences of racial language in Absalom, Absalom!

The phrase that stands out the most is one that Rosa Coldfield uses early on: “wild niggers.” It becomes a leitmotif for much of the text and recurs throughout. Yet who repeats it, and how, changes.

Figure 3: The chart above shows the respective uses of the phrases 'Wild Nigger' and 'Wild Negro'. It was created by manually searching the text for the phrases and identifying the speaker of each.

In their use of “wild niggers” versus “wild negro,” Quentin and Rosa Coldfield show an inverse relationship. This is curious because it is Rosa who so jarringly describes the demonic Sutpen arriving in Yoknapatawpha:

Out of quiet thunderclap he would abrupt (man-horse-demon) upon a scene peaceful and decorous as a schoolprize water color, faint sulphur-reek still in hair clothes and beard, with grouped behind him his band of wild niggers like beasts half tamed to walk upright like men, in attitudes wild and reposed, and manacled among them the French architect with his air grim, haggard, and tatter-ran.